ABSTRACT

Context. Paper I of the series re-derived the radio interferometry measurement equation (RIME) from first principles, and extended 
the Jones formalism to the full-sky case, incorporating direction-dependent effects (DDEs). 

Aims. This paper aims to describe both classical radio interferometric calibration (selfcal and related methods), and the recent devel- 
opments in the treatment of DDEs, using the RIME-based mathematical framework developed in Paper I. It also aims to demonstrate 
the ease with which the various effects can be described and understood. 

Methods. The first section of this paper uses the RIME formalism to describe self-calibration, both with a full RIME, and with the 
approximate equations of older software packages, and shows how this is affected by DDEs. The second section gives an overview of 
real-life DDEs and proposed methods of dealing with them. 

Results. A formal RIME-based description and comparison of existing and proposed approaches to the problem of DDEs. 
Key words. Methods: analytical - Methods: data analysis - Techniques: interferometric - Techniques: polarimetric

Introduction

Paper I of this series (Smirnov 2011a) extended the RIME formalism
(Hamaker et al. 1996; Hamaker 2000) to the full-sky case, culminating
in the following equation for the visibility matrix measured by
interferometer pq:

$$V_{pq} = G_p \left( \iint_{lm} B_{pq}\, e^{-2\pi i (u_{pq} l + v_{pq} m)}\, dl\, dm \right) G_q^H, \qquad B_{pq} = E_p B E_q^H \qquad (1)$$

The B term is a 2 x 2 brightness matrix, describing the polarized sky
brightness as a function of direction l, m. The $G_p$ Jones matrices
represent the per-antenna direction-independent effects (DIEs), which
are the province of traditional second-generation calibration (2GC)
techniques, most notably selfcal. The $E_p$ Jones matrices represent
the direction-dependent effects (DDEs).

DDEs violate the traditional premise of 2GC, which is that an
interferometer array measures the Fourier transform of one "common"
sky. Instead, in the presence of DDEs, each baseline sees its own
apparent sky $B_{pq}$. The traditional premise only holds when the
DDEs are identical across all antennas, and constant in time:
$E_p = E$. Under this condition, the apparent sky becomes the same on
all baselines ($B_{app} = E B E^H$), and the full-sky RIME becomes
simply:

$$V_{pq} = G_p X_{pq} G_q^H, \qquad (2)$$

where $X_{pq} = X(u_{pq}, v_{pq})$, and the matrix function $X(u, v)$,
called the sky coherency, is the (element-by-element) two-dimensional
Fourier transform of the matrix function $B_{app}(l, m)$.

Section 1 of this paper reviews the 2GC calibration problem, and shows
how the RIME formalism can be (and has been) applied. Section 2 then
looks at the problem of DDEs, describes how they impact calibration,
and discusses some current and future approaches.

1. Calibration and the RIME

In the traditional (2GC) view, calibration refers to a process by
which the instrumental errors are estimated and corrected for, while
imaging is the process of turning the corrected visibilities into an
image, followed by deconvolution to take out the effects of the point
spread function. While algorithms such as Cotton-Schwab CLEAN
(Schwab 1984) have blurred the boundaries between imaging and
deconvolution, the separation between calibration and imaging is
firmly entrenched in 2GC selfcal implementations (where the two
processes are typically implemented via completely separate tools),
and has historically led to a divergence of the algorithm development
community into "calibration people" and "imaging and deconvolution
people".

The RIME, and recent developments in understanding of DDEs, have been
eroding this distinction. On the one hand, advances in image
reconstruction techniques (for an overview, see Rau et al. 2009) have
been usurping some traditional functions of calibration, while new
methods of source modelling on the calibration side, such as the use
of shapelets (Yatawatta et al. 2010), rely on increasingly elaborate
models being constructed for a large part of the flux (with
traditional imaging then only used for the lower-level residuals). In
RIME terms, both processes should be thought of as two aspects of the
same optimization problem: estimating $B(l, m)$, $E_p(l, m)$ and $G_p$
in an equation such as (1) that yield the best fit to a set of
observed visibilities ("data") $D_{pq}$. Traditional selfcal solves
for the direction-independent terms $G_p$, traditional imaging yields
the $B(l, m)$, and the non-trivial DDEs $E_p(l, m)$ have
(traditionally) been ignored. The historical calibration-imaging
separation corresponds to a two-stage recursive optimization process.
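
As an illustration of the first of these two stages, the sketch below fits direction-independent gains to Eq. (2) for a fixed sky model. It is a minimal example under simplifying assumptions that are not taken from the text: scalar (rather than 2 x 2) gains, a single unit point source at the phase centre, noise-free data, and an alternating least-squares update with averaging. The function name `solve_gains` and all numerical values are hypothetical, and this is not the solver of any particular package.

```python
import numpy as np

def solve_gains(data, model, n_iter=200):
    """Fit scalar gains g so that data[p, q] ~= g[p] * model[p, q] * conj(g[q]), p != q."""
    n_ant = data.shape[0]
    g = np.ones(n_ant, dtype=complex)
    off_diag = ~np.eye(n_ant, dtype=bool)        # autocorrelations are ignored
    for _ in range(n_iter):
        g_new = np.empty_like(g)
        for p in range(n_ant):
            q = off_diag[p]
            # Model visibilities for antenna p with all other gains held fixed.
            z = model[p, q] * np.conj(g[q])
            # Linear least-squares solution of data[p, q] ~= g[p] * z.
            g_new[p] = np.vdot(z, data[p, q]) / np.vdot(z, z)
        g = 0.5 * (g + g_new)                    # averaging damps oscillations
    return g

rng = np.random.default_rng(0)
n_ant = 7
g_true = rng.uniform(0.8, 1.2, n_ant) * np.exp(1j * rng.uniform(-0.5, 0.5, n_ant))

model = np.ones((n_ant, n_ant), dtype=complex)           # X_pq: unit point source
data = np.outer(g_true, np.conj(g_true)) * model         # D_pq = g_p X_pq conj(g_q)

g_sol = solve_gains(data, model)
corrected = data / np.outer(g_sol, np.conj(g_sol))       # "corrected data"

off_diag = ~np.eye(n_ant, dtype=bool)
print("max |corrected - model|:", np.abs(corrected - model)[off_diag].max())
```

The recovered gains match the true ones only up to a common phase factor, which cancels in the product $g_p g_q^*$; this is the scalar counterpart of the ambiguities discussed in Sect. 1.5.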

1.1. Implicit RIMEs 

Existing 2GC packages all make use of some implicit version 
of the RIME. It is useful to consider at least one example in
depth. In Paper III (Smirnov 2011b), I shall be comparing the
results of a MeqTrees calibration using an explicit RIME to
those obtained with the NEWSTAR package on the same data.
NEWSTAR therefore makes for a perfect example.

The exact form of the RIME implemented by NEWSTAR
depends on the options used.¹ The one relevant to the reductions
of Paper III is:



$$V_{pq} = G_p (M_{pq} * X_{pq}) G_q^H, \qquad X_{pq} = \sum_s E_s X_{spq} \qquad (3)$$



The constituent parts of this equation are as follows: 

$X_{spq}$ is the coherency of source s. NEWSTAR sky models are
composed of discrete point sources or extended Gaussian
components. For a point source, $X_{spq} = K_p Q_{spq} B_s K_q^H$.

$B_s$ is the source brightness. This can be further parametrized in
terms of Stokes IQUV, spectral index and rotation measure.

$Q_{spq}$ is a per-source correction factor to account for time and
bandwidth smearing (see Paper I, Smirnov 2011a, Sect. 5.2).

$E_s$ is the primary voltage beam gain. NEWSTAR uses an
analytic approximation of the WSRT beam (see Sect. 2.1.1).
This is implicitly treated as a trivial DDE, i.e. constant in
time and the same across all stations.

$X_{pq}$ is thus the "model visibilities", i.e. the sum of coherencies
of all sources in the model.

$G_p$ is a diagonal matrix of complex per-station gain terms.

$M_{pq}$ is a 2 x 2 matrix of multiplicative interferometer errors (see
Paper I, Smirnov 2011a, Sect. 5.3), and "*" is element-by-element
multiplication. Here it is on the inside of the equation
rather than on the outside as in Eq. (24) of Paper I: this
is due to the way NEWSTAR uses "corrected data" in its
selfcal procedure.

NEWSTAR's calibration and imaging procedure typically
consists of some combination of and/or iteration over the
following steps:

Gain calibration: find $G_p$ that minimizes $|G_p X_{pq} G_q^H - D_{pq}|$
in a least-squares sense. Compute "corrected data" as
$D'_{pq} = G_p^{-1} D_{pq} (G_q^H)^{-1}$.

Closure errors: find $M_{pq}$ that minimizes $|M_{pq} * X_{pq} - D'_{pq}|$.
Compute "corrected data" as $D''_{pq} = D'_{pq} \div M_{pq}$ (where
"÷" is element-by-element division, the inverse of "*").

Model subtraction: Compute "residual data" as $R_{pq} = D''_{pq} - X_{pq}$.
$R_{pq}$ thus contains the visibility contribution of faint
background sources not present in the model, corrected for the
estimated antenna gains and interferometer errors.

Imaging and deconvolution: turn the $R_{pq}$ visibilities into an
image, and deconvolve it using Högbom CLEAN.

Source finding: Perform a source finding procedure on the
residual image to update the sky model.

Model update: Solve for the parameters of the new sources by
minimizing $|D''_{pq} - X_{pq}|$ (usually on a small subset of the data).

Model restore: Add the sky model into residual images (after
another calibration/subtraction cycle, if the model was
updated), using a Gaussian restoring beam.

Calibration procedures implemented by other 2GC packages
may differ in detail, but are very similar in principle. The
crucial common concepts are: (a) the use of an equation such as
(2), which clearly separates the model visibilities ($X_{pq}$) from
antenna-based errors ($G_p$), and (b) the procedure of correcting
visibilities (whether on-the-fly or in storage) by applying the
inverse of the $G_p$ solutions. Both concepts break down when DDEs
become involved, as will be discussed in Sect. 2.

¹ The version of the NEWSTAR RIME covered here does not
include bandpass or polarization calibration. These options are
available in NEWSTAR, but they were not used for the calibration
described in Paper III (Smirnov 2011b).

1.2. Explicit RIMEs

An example of an explicit RIME is implemented in CASA. This
also relies on the concept of model visibilities:

$$V_{pq} = J_p X_{pq} J_q^H \qquad (4)$$

Here, $X_{pq}$ is the model visibility (which may be computed
from an image and/or a list of NEWSTAR-like components),
and $J_p$ is composed of several different Jones terms, typically
(Myers et al. 2010, Appendix E.1):

$$J_p = B_p G_p D_p E_p P_p T_p \qquad (5)$$

Each term has its own specific implementation (in case of
known terms) and parametrization (in case of solvable terms).
Finally, multiplicative interferometer-based errors ($M_{pq}$) may
be optionally applied to either the outside of the equation (as
per Eq. 24 of Paper I, Smirnov 2011a), or to $X_{pq}$ itself (à la
NEWSTAR, see Eq. 3 above).

Conceptually, calibration in CASA is very similar to the
procedure described in the previous section, but the use of an
explicit RIME confers several advantages. The known terms of the
Jones chain (Eq. 5) can be taken into account properly, while the
solvable terms can be solved for in different combinations. The
caveats of using such a specific form of the RIME have already
been discussed in Paper I (Smirnov 2011a, Sect. 6.2).

Note that although CASA also relies on the essentially
2GC-rooted concepts of model and corrected visibilities,
the framework has been successfully used for the development
of algorithms for calibration and correction of DDEs,
namely W-projection (Cornwell et al. 2008), pointing selfcal
(Bhatnagar et al. 2004) and AW-projection (Bhatnagar et al.
2008). I will discuss these further in Sect. 2.

1.3. Phenomenological RIMEs



My experiments with calibration in MeqTrees have favoured
phenomenological RIMEs (Noordam & Smirnov 2010). Rather
than writing out long Jones chains such as that of Eq. (5), which
attempt to follow the physics of the signal propagation chain, the
phenomenological approach consists of using a RIME with the
minimum number of solvable terms needed to represent the
cumulative effect of the chain. Each phenomenological term then
ends up subsuming several different physical effects. The
rationale for this approach is that, on the one hand, we only need
to capture the overall effect for purposes of calibration, while
on the other hand, the individual effects often cannot be
distinguished at all, apart from their different behaviour in time and
frequency, which we try to capture with individual
phenomenological terms.

For example, a full-polarization bandpass-gain calibration of
the WSRT can be done² using the following phenomenological
RIME:

$$V_{pq} = G_p B_p X_{pq} B_q^H G_q^H$$

Here, $G_p$ is a solvable diagonal complex matrix with rapid
variation in time, and none in frequency. This subsumes
antenna/receiver gains (G-Jones, in CASA nomenclature) and
atmospheric phase (T-Jones). $B_p$ is a solvable full 2 x 2 complex
matrix with high variability in frequency, but little to none in
time. This subsumes bandpass (B-Jones), polarization leakage
(D-Jones) and on-axis beam gain (E-Jones). More real-life
examples of phenomenological RIMEs will be discussed in Paper
III (Smirnov 2011b).

Where a specific Jones term is known from a priori consid- 
erations, it can and should be inserted into a phenomenological 
RIME. For example, the equation above would not be suitable 
for polarization calibration of the VLA because of parallactic 
angle rotation. The equation would need to be rewritten with an 
extra P-Jones term, which is not solved for, but rather computed 
analytically: 

$$V_{pq} = G_p B_p P_p X_{pq} P_q^H B_q^H G_q^H \qquad (6)$$

One must be mindful of matrix (non)commutation when
constructing phenomenological RIMEs. The reason the full CASA
Jones chain of Eq. (5) can be captured by the much simpler Eq.
(6) is that some Jones matrices do commute (see also Paper
I, Smirnov 2011a, Sect. 1.6). In particular, T-Jones is scalar and
so commutes with everything, while the CASA B-Jones and
G-Jones are diagonal and so commute among themselves. This
allows us to rewrite Eq. (5) as:

$$J_p = (G_p T_p)(B_p D_p E_p) P_p,$$

which makes the link to Eq. (6) obvious.

To give a counter-example, in the presence of significant
Faraday rotation (time-variable or differential, see Sect. 2.2.2),
this equation is not appropriate, because the Faraday rotation
term $F_p$ (placed at the right-hand side of the chain) does not
commute, and so would necessitate an extra term in Eq. (6).
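
The commutation arguments of this section are easy to check numerically. The sketch below is only an illustration: the matrix values are arbitrary stand-ins (not taken from any instrument), and `rot` is a helper defined here for a generic rotation matrix such as P-Jones or F-Jones in a linear basis.

```python
import numpy as np

def rot(angle):
    """2 x 2 rotation matrix, standing in for P-Jones or Faraday rotation."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]], dtype=complex)

# Arbitrary illustrative values for each Jones term.
T = 0.9 * np.exp(0.3j) * np.eye(2)                        # scalar
G = np.diag([1.1 * np.exp(0.2j), 0.8 * np.exp(-0.4j)])    # diagonal
B = np.diag([0.95 + 0.10j, 1.05 - 0.20j])                 # diagonal
D = np.array([[1.0, 0.02], [-0.03j, 1.0]])                # leakage, full matrix
E = 0.7 * np.eye(2, dtype=complex)                        # on-axis beam gain
P = rot(0.7)                                              # rotation

# Scalar matrices commute with everything; diagonal ones commute with each other.
scalar_commutes = np.allclose(T @ P, P @ T)
diagonals_commute = np.allclose(G @ B, B @ G)

# A rotation does not commute with a non-scalar diagonal matrix.
rotation_commutes_with_diagonal = np.allclose(G @ P, P @ G)

# Hence the chain of Eq. (5) can be regrouped as (G T)(B D E) P.
chain = B @ G @ D @ E @ P @ T
regrouped = (G @ T) @ (B @ D @ E) @ P
chain_matches = np.allclose(chain, regrouped)

print(scalar_commutes, diagonals_commute,
      rotation_commutes_with_diagonal, chain_matches)
```

The third check fails by design: a rotation term cannot be moved past a diagonal gain term, which is why a Faraday rotation term cannot simply be absorbed into the existing terms of Eq. (6).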

1.4. The impact of the RIME on calibration 

The reasoning used above to construct phenomenological 
RIMEs illustrates one of the biggest benefits that the RIME for- 
malism has brought to the field of calibration. Pre-RIME, de- 
scriptions of signal propagation effects were ad hoc and approx- 
imate, while arguments about the order in which they should be 
calibrated for were difficult to follow. The RIME formalism has 
recast all this in terms of straightforward and rigorous matrix 
algebra. 

The second benefit of the RIME formalism is the clarity it
has brought to polarization calibration. Note that the implicit
NEWSTAR RIME given above (Eq. 3) ignores polarization
effects almost completely. NEWSTAR does have some
polarization calibration capabilities (as do other 2GC packages), but
these are specifically tuned to the WSRT case. The RIME
formalism allows for a much more general description of
polarization effects. The D and P terms of the CASA RIME (Eq. 5) are
an example, but see also the discussion of differential Faraday
rotation in Sect. 2.2.2.

² In the absence of DDEs.

Perhaps most importantly, the RIME gives us the mathemat- 
ical language to tackle the problem of DDEs, which will be the 
subject of the next section. 

1.5. Calibration ambiguities 

No discussion of calibration with the RIME can be complete
without mentioning the ambiguity problem pointed out by
Hamaker (2000, 2006). In classical selfcal, there is a well-known
flux and position ambiguity: multiplying all the antenna gains
by a complex factor a, and the source coherency by $|a|^{-2}$, does
not change the observed visibilities. Therefore, selfcal by itself
cannot determine absolute fluxes and positions; these require
known calibrators. There is a full-polarization equivalent to this,
but it is extremely difficult to formulate and understand outside
the RIME formalism.

For a direct analogue, consider a RIME such as that in
Eq. (2). For any non-singular matrix A, we have:

$$V_{pq} = G_p X_{pq} G_q^H = (G_p A)(A^{-1} X_{pq} (A^H)^{-1})(G_q A)^H$$

In other words, we can multiply all the per-antenna uv-Jones
terms by A, and the source coherency by $A^{-1}$ and $(A^H)^{-1}$, without
changing the observed visibilities. Therefore, we need known
calibrators to properly fix the $G_p$'s. Having observed a calibrator
source, we can fix the brightness B (and therefore the coherency
$X_{pq} = K_p B K_q^H$), and solve for $G_p$. However, it is easy to see that
an unpolarized calibrator alone is insufficient. The brightness
(and coherency) matrix of an unpolarized source is scalar, so
for any unitary³ matrix U, we have $U X_{pq} U^H = U U^H X_{pq} = X_{pq}$,
or:

$$V_{pq} = G_p X_{pq} G_q^H = (G_p U) X_{pq} (G_q U)^H.$$

Thus, given a known but unpolarized sky, we can only de- 
termine G p to within an arbitrary unitary ambiguity factor U. 
In other words, we cannot fix the polarization response of our 
system without polarized calibrators. A physical example of 
such an ambiguity is rotation of all dipoles by the same angle: 
this cannot be detected through observations of an unpolarized 
source. 
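
The unitary ambiguity can be verified numerically. In the sketch below, all matrices are random or arbitrary illustrative values; the unitary U is taken as the Q factor of a QR decomposition, and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_complex(shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def visibility(G_p, X, G_q):
    """V_pq = G_p X G_q^H, as in Eq. (2)."""
    return G_p @ X @ G_q.conj().T

G_p, G_q = random_complex((2, 2)), random_complex((2, 2))
U, _ = np.linalg.qr(random_complex((2, 2)))      # Q factor of a QR decomposition is unitary

X_unpol = 2.0 * np.eye(2)                        # unpolarized source: scalar coherency
X_pol = np.array([[2.5, 0.3 + 0.1j],
                  [0.3 - 0.1j, 1.5]])            # polarized source

# Replacing every G by G U leaves the unpolarized visibilities unchanged...
unpol_same = np.allclose(visibility(G_p, X_unpol, G_q),
                         visibility(G_p @ U, X_unpol, G_q @ U))
# ...but changes the polarized ones for a generic U.
pol_same = np.allclose(visibility(G_p, X_pol, G_q),
                       visibility(G_p @ U, X_pol, G_q @ U))

print(unpol_same, pol_same)
```

An unpolarized calibrator therefore cannot distinguish $G_p$ from $G_p U$, whereas a polarized source constrains at least part of this freedom.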

As it turns out, even a polarized calibrator alone is insufficient,
though the matrix algebra gets a bit complicated at this
point. The B matrix is Hermitian positive-definite by construction,
and has a Cholesky decomposition,⁴ i.e. there exists a
lower-triangular L such that $L L^H = B$. For any unitary U, we
then have:

$$(L U L^{-1}) B (L U L^{-1})^H = L U (L^{-1} L)(L^H (L^H)^{-1}) U^H L^H = L L^H = B.$$

Therefore, given a single polarized calibrator, we still have
an ambiguity factor of $L U L^{-1}$! Physical examples of this are
somewhat more elaborate, but perhaps the simplest one is that
a source with Q polarization only is insensitive to a certain
combination of dipole rotation and gain adjustment. Indeed, for

$$B = \begin{pmatrix} I+Q & 0 \\ 0 & I-Q \end{pmatrix}, \quad \mathrm{we\ have} \quad L = \begin{pmatrix} \sqrt{I+Q} & 0 \\ 0 & \sqrt{I-Q} \end{pmatrix},$$

and, given a rotational U, the resulting ambiguity factor is

$$L\,\mathrm{Rot}(\phi)\,L^{-1} = \begin{pmatrix} \cos\phi & -\sqrt{\frac{I+Q}{I-Q}}\,\sin\phi \\ \sqrt{\frac{I-Q}{I+Q}}\,\sin\phi & \cos\phi \end{pmatrix}.$$

³ U is unitary if $U U^H = 1$.

⁴ A Hermitian matrix P is positive-definite if $z^H P z > 0$ for all
non-zero complex vectors z. That B is positive-definite follows from
Sylvester's criterion (Gilbert 1991), because $I + Q > 0$ and
$\det B = I^2 - Q^2 - U^2 - V^2 > 0$. In fact, the Cholesky decomposition
for B can be worked out directly:

$$L = \begin{pmatrix} \sqrt{I+Q} & 0 \\ \frac{U - iV}{\sqrt{I+Q}} & \sqrt{\frac{I^2 - Q^2 - U^2 - V^2}{I+Q}} \end{pmatrix}.$$

The upshot of this is that unambiguous calibration of the po- 
larization response of an interferometer requires multiple polar- 
ized calibrator sources, and/or additional assumptions about the 
sky (e.g. $V \equiv 0$, which was a common assumption in the
pre-RIME era). Hamaker (2006) explores these issues in more detail.
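
The Cholesky-based ambiguity can also be checked numerically. The following sketch assumes the brightness matrix convention $B = ((I+Q,\ U+iV),\ (U-iV,\ I-Q))$ used for the apparent-sky expressions in this paper; the Stokes values and rotation angle are arbitrary illustrative choices.

```python
import numpy as np

def brightness(I, Q, U, V):
    """Brightness matrix in a linear basis (no factor of 1/2, as an assumed convention)."""
    return np.array([[I + Q, U + 1j * V],
                     [U - 1j * V, I - Q]])

def rot(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]], dtype=complex)

I, Q, U, V = 1.0, 0.2, 0.1, 0.05                 # arbitrary partially polarized source
B = brightness(I, Q, U, V)

L = np.linalg.cholesky(B)                        # lower-triangular, L @ L^H == B
A = L @ rot(0.4) @ np.linalg.inv(L)              # ambiguity factor L U L^{-1}
B_after = A @ B @ A.conj().T

# Closed-form Cholesky factor worked out from the Stokes parameters.
det_B = I**2 - Q**2 - U**2 - V**2
L_closed = np.array([[np.sqrt(I + Q), 0.0],
                     [(U - 1j * V) / np.sqrt(I + Q), np.sqrt(det_B / (I + Q))]])

print("B unchanged by A:", np.allclose(B_after, B))
print("A is the identity:", np.allclose(A, np.eye(2)))
print("closed-form L matches:", np.allclose(L, L_closed))
```

The factor A is neither the identity nor (in general) unitary, yet it leaves B unchanged, so a single polarized calibrator cannot rule it out.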

We should note that though the matrix equations above may 
seem somewhat complicated, they are far more succinct and 
complete than any scalar equations that have been used to de- 
scribe polarization calibration prior to the RIME. Once again, 
the RIME provides a rigorous mathematical language to describe 
what is otherwise an extremely tricky problem. 

2. Direction-dependent effects (DDEs) 

Most of the problems associated with non-trivial DDEs are
already pointed to by Eq. (1). The fundamental assumption of
traditional selfcal is that DDEs are trivial, meaning that:

- Each observed visibility $V_{pq}$ is a measurement of the
sky coherency function $X(\mathbf{u})$ at point $\mathbf{u}_{pq}$, corrupted by
some combination of multiplicative (per-antenna or
per-interferometer) gain terms.

- The coherency function $X(\mathbf{u})$ is a Fourier transform of the
apparent sky $B_{app}(\mathbf{l})$ (see also Eq. 2).

DDEs are a multiplication in the lm plane, which corresponds
to a convolution in its Fourier counterpart, the uv plane.
That is, in the presence of non-trivial DDEs $E_p(\mathbf{l})$ (including a
non-trivial $W_p$ term), the observed visibility is actually a
convolution of the sky coherency. Assuming $G_p = 1$ for the moment,
Eq. (1) then gives us:



$$V_{pq} = X_{pq}(\mathbf{u}_{pq}), \qquad X_{pq} = U_p \circ X \circ U_q^H \qquad (7)$$



where "o" is a matrix convolution (i.e. following the same 
rules as matrix multiplication, with each elementary multiplica- 
tion replaced by a convolution), and the convolution kernels U p 
are Fourier transforms of the sky-Jones terms E p . We can rewrite 
this equation to emphasize the time variability, and the fact that 
any given interferometer pq only samples one point u pq of the 
uv plane at a time: 



$$V_{pq}(t) = X_{pq}[t](\mathbf{u}_{pq}(t)), \qquad X_{pq}[t] = U_p[t] \circ X \circ U_q^H[t], \qquad X = \mathcal{F}B, \quad U_p[t] = \mathcal{F}E_p[t] \qquad (8)$$



This equation captures the heart of the DDE problem: DDEs
convolve the "ideal" visibilities, with (in the general case) a
different kernel for every antenna and time sample. Instead of
sampling one uv plane (X), we have a separate uv plane for each pq
and time interval ($X_{pq}[t]$), and we are sampling each such plane
at only one (or at most a handful) of points. Convolution is not
uniquely reversible at the best of times; with such limited
sampling it is even less tractable. This is the reason why in the
presence of DDEs, corrected visibilities (in the sense of Sect. 1.1) do
not exist. To be more precise, they may exist in the mathematical
sense, but recovering them is an inverse (and ill-posed) problem.
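
The multiplication-convolution duality behind Eqs. (7) and (8) can be illustrated with a discrete, one-dimensional, scalar analogue. This is a toy sketch only: the arrays `sky` and `dde` are random stand-ins for $B(l)$ and a scalar $E(l)$, and the discrete Fourier transform implies a circular convolution, which is an artefact of the discretization.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64
sky = rng.standard_normal(n)                     # 1-D stand-in for B(l)
dde = np.exp(1j * rng.standard_normal(n))        # 1-D stand-in for a scalar E(l)

def circular_convolve(a, b):
    """(a * b)[k] = sum_j a[j] b[(k - j) mod n]."""
    n = len(a)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return (a[None, :] * b[idx]).sum(axis=1)

# Fourier transform of the product E(l) B(l) ...
lhs = np.fft.fft(dde * sky)
# ... equals the convolution of the two transforms (up to the DFT normalization).
rhs = circular_convolve(np.fft.fft(dde), np.fft.fft(sky)) / n
print("convolution theorem holds:", np.allclose(lhs, rhs))

# A trivial DDE (constant over the field) has a single-pixel kernel, so the
# "convolution" reduces to a plain multiplicative gain.
trivial_kernel = np.fft.fft(np.ones(n))
print("non-zero kernel pixels:", np.count_nonzero(np.abs(trivial_kernel) > 1e-9))
```

A direction-independent term therefore acts as a per-visibility gain that can be divided out, while any genuine direction dependence spreads the kernel over many uv pixels, of which each baseline samples only one at a time.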

In this section, I will first consider the two common sources 
of DDEs: the ionosphere and the primary beam, and then discuss 
some proposed methods of dealing with them. 

2.1. E-Jones: beam-related DDEs 

The primary beam gain, commonly designated as the E-Jones, is
the single most ubiquitous DDE (since every telescope, after all,
has a beamshape of some kind), and probably the most
problematic.⁵ The implicit simplifying assumption of 2GC packages is
that the interferometer observes an "apparent sky": that is, some
true sky $B(l, m)$, attenuated by a power beam $|E(l, m)|^2$. Given a
reasonably accurate model for the beam, the final images can be
multiplied by $|E(l, m)|^{-2}$ to correct the flux scale (at the cost of
increasing the image noise away from centre).

In RIME terms, this classical assumption corresponds to an
E-Jones that is a trivial DDE (i.e. constant in time, and same
across all stations), but also the same for both receivers and thus
scalar: $E_p(t, l, m) = E(l, m)$. We can then commute the E term
in the apparent sky equation (1), which becomes a simple
multiplication of the true sky B by $E E^H = |E|^2$. (Incidentally, this
also shows why classical selfcal does not concern itself with the
complex phase of the primary beam.)

Real-life beams deviate from these assumptions in a number 
of ways, some of them less well understood than others. 

2.1.1. The WSRT and VLA E-Jones

The WSRT primary beam is commonly approximated as:

$$E(l, m) = \cos^3\left(C \nu \sqrt{l^2 + m^2}\right),$$

where C has a very mild dependence on $\nu$ (i.e. is
effectively constant for a given band). This model is only valid for
the main lobe, down to about the 10% level. Popping & Braun
(2008) have made a detailed empirical study of the WSRT
primary beam, which shows significant four-fold symmetric
structure out in the sidelobes (caused by the feed legs). More
significantly, they have shown a quasi-periodic "ripple" in the off-axis
beam gain as a function of frequency, with a period of ~17
MHz. This is commonly seen in the observed spectra of off-axis
sources.

Similarly to the WSRT $\cos^3$ model, the VLA primary
beam has a reasonable analytic approximation using Jinc
functions, which is valid to about the 5% level of the main lobe
(Uson & Cotton 2008). Brisken (2003) has made
electromagnetic simulations that show the sidelobe structure. What
significantly complicates the VLA case is beam squint (the beam
pattern of the R and L receptors being offset w.r.t. the pointing
centre due to the feeds being off-axis), and parallactic angle
rotation.



5 In the general formulations above, I used E to refer to all DDEs 
in the signal path. At the risk of confusion, this section will also use 
E for the beam-related Jones term in particular. The ubiquitous nature 
of beamshapes, and the problems associated with them, is perhaps a 
justification for using "E" as the "representative" DDE letter. 
2.1.2. Parallactic angle rotation

An alt-az mount telescope, without a dish derotator such as that
designed into ASKAP (Johnston et al. 2008), has an intrinsically
time-variable beamshape in the lm frame, as the nominal beam
pattern rotates with parallactic angle. Like any DDE, this causes
significant spatial artefacts around off-axis sources that cannot
be addressed by classical selfcal. This has been a serious
dynamic range limitation at the VLA, but some recent
developments promise to alleviate the problem. Uson & Cotton (2008)
describe a CLEAN-like algorithm (implemented in the Obit
package) that corrects these artefacts during deconvolution; the
RIME-derived AW-projection method of Bhatnagar et al. (2008)
can correct them during imaging. Note that both methods rely on
an a priori beam model, and have, to date, only been
applied to VLA data, for which the Brisken simulations provide
a very detailed beam model. It remains to be seen whether the
more approximate models available for other instruments will
prove to be a limiting factor.

The WSRT's equatorial mounts (and ASKAP's derotator)
keep the beamshape stationary in the lm frame, thus avoiding
this problem entirely.

A particularly troublesome situation arises when a suffi- 
ciently bright source is located in a sidelobe or near a null, where 
sky rotation causes rapid variation in the beam gain, and the ac- 
curacy of existing beam models is low. Such sources have to 
be calibrated and subtracted separately, either via some kind of 
peeling procedure, or by using the differential gain approach described
in Sect. 2.4.3. Even at the WSRT, where rotation is not an
issue and the beam gain remains (at least in principle) constant 
in time, sources in a sidelobe need to be treated very carefully, 
due to the rapid spectral variation caused by the 17 MHz ripple. 



2.1.3. Instrumental polarization 

Instrumental polarization comes about due to the beam patterns 
of the two receptors being non-identical. In RIME terms, this 
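corresponds to E-Jones being diagonal rather than simply scalar:

$$E(l, m) = \begin{pmatrix} e_x(l, m) & 0 \\ 0 & e_y(l, m) \end{pmatrix},$$

which causes an unpolarized off-axis source to "acquire"
some Q (or V, if using circular receptors):

$$B_{app} = \begin{pmatrix} e_x & 0 \\ 0 & e_y \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} e_x^* & 0 \\ 0 & e_y^* \end{pmatrix} = \begin{pmatrix} |e_x|^2 & 0 \\ 0 & |e_y|^2 \end{pmatrix}$$

The WSRT case is rather simple: the beamshape of each
dipole is slightly elongated rather than circularly symmetric.
Since these beamshapes are stationary w.r.t. the sky, the net
result is an "apparent sky" with a non-uniform polarization
response:

$$B_{app} = \begin{pmatrix} |e_x|^2 (I + Q) & e_x e_y^* (U + iV) \\ e_x^* e_y (U - iV) & |e_y|^2 (I - Q) \end{pmatrix}$$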

Similarly to power beam attenuation, this effect can be re- 
moved (to the extent that the primary beam is known) via a linear 
correction to the final images. 
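
The apparent polarization produced by a diagonal E-Jones can be worked through numerically. The receptor gains below are arbitrary illustrative values, and the helper functions assume the brightness matrix convention of the expression above.

```python
import numpy as np

def brightness(I, Q, U, V):
    return np.array([[I + Q, U + 1j * V],
                     [U - 1j * V, I - Q]])

def stokes(B):
    """Invert the brightness matrix back into (I, Q, U, V)."""
    I = 0.5 * (B[0, 0] + B[1, 1]).real
    Q = 0.5 * (B[0, 0] - B[1, 1]).real
    U = 0.5 * (B[0, 1] + B[1, 0]).real
    V = 0.5 * (B[0, 1] - B[1, 0]).imag
    return I, Q, U, V

# Hypothetical off-axis voltage gains of the two receptors.
e_x, e_y = 0.9, 0.8 * np.exp(0.1j)
E = np.diag([e_x, e_y])

B_true = brightness(1.0, 0.0, 0.0, 0.0)          # unpolarized, unit intensity
B_app = E @ B_true @ E.conj().T                  # apparent sky towards this direction

I_app, Q_app, U_app, V_app = stokes(B_app)
print("apparent I, Q, U, V:", I_app, Q_app, U_app, V_app)
```

With these values the source appears with $Q/I \approx 12\%$ although it is intrinsically unpolarized; the relative phase between $e_x$ and $e_y$ has no effect here, because it only enters the off-diagonal terms.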

For the VLA, non-identical receptor beams are caused by the
aforementioned squint; the squint offset rotates with parallactic
angle (and thus as a function of time). This leads to a rather
complicated picture of instrumental polarization, but is essentially
the same problem (with the same solutions) as primary beam
rotation.

Note that in contrast to the WSRT case, the simulations
of Brisken (2003) show that the VLA E-Jones has non-trivial
elements on the off-diagonal. This is an example of
direction-dependent polarization leakage. Leakage has been commonly
associated with slight errors in dipole orientation,
electromagnetic cross-talk, etc., and treated as a direction-independent
effect (Hamaker et al. 1996; Noordam 1996); Brisken's results
demonstrate that it is actually a DDE.

Finally, it should be mentioned that the polarization
aberration described by Carozzi & Woan (2009) can also be treated
as direction-dependent instrumental polarization (see Paper I,
Smirnov 2011a, Sect. 5.4).

The RIME makes it explicit that effects such as (variable)
primary beam attenuation, instrumental polarization, and leakage,
which are treated separately (if at all) in 2GC, can in fact be
represented by a single Jones term, and treated via a single
mechanism. Perhaps the starkest example of this is provided
by aperture array beams, such as those of LOFAR (Yatawatta
2008). With the dipoles of an aperture array fixed on the ground,
$E(l, m)$ towards any specific sky direction exhibits complex
time-dependent behaviour in all four matrix elements. This
completely blurs the boundaries between primary beams, leakage
and instrumental polarization.



2.1.4. Pointing errors & dish deformation 

All telescopes mispoint to some extent. This is caused by grav- 
itational load, thermal expansion, wind pressure, errors in the 
drive mechanics or even the control software, etc. In RIME 
terms, this can be represented by a station-dependent offset in 
the beam pattern, causing a nominally identical beamshape E to 
produce a different response per station: 



$$E_p(l, m) = E(l + \delta l_p, m + \delta m_p) \qquad (9)$$

The offset $\delta l_p, \delta m_p$ is, in general, time-variable. Since the
effect of mispointing on observed visibilities is roughly
proportional to $\partial E/\partial l$ and $\partial E/\partial m$, it is lowest at the centre of the beam
(where the beamshape is flat), and highest on the flank of the
beam and around the nulls. Classical selfcal tends to "absorb"
the effect of mispointing in the direction of the dominant source
into the per-station amplitude gain solutions.
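
The dependence of the mispointing error on position within the beam can be illustrated with Eq. (9) and the $\cos^3$ model of Sect. 2.1.1. In the sketch below, the `scale` parameter stands in for the product $C\nu$ and, like the offset and source positions, is a hypothetical value chosen only to keep the example inside the main lobe.

```python
import numpy as np

def beam(l, m, scale=60.0):
    """cos^3 voltage beam; `scale` is a hypothetical stand-in for C * nu (rad^-1)."""
    return np.cos(scale * np.hypot(l, m)) ** 3

def mispointed_beam(l, m, dl, dm, scale=60.0):
    """Eq. (9): the nominal beam evaluated at an offset position."""
    return beam(l + dl, m + dm, scale)

dl = 1e-4                                        # hypothetical pointing offset (rad)

# Gain error towards a source at the beam centre and one on the flank.
centre_error = abs(mispointed_beam(0.0, 0.0, dl, 0.0) - beam(0.0, 0.0))
flank_error = abs(mispointed_beam(0.01, 0.0, dl, 0.0) - beam(0.01, 0.0))

print("gain error at centre:", centre_error)
print("gain error on flank: ", flank_error)
```

For these values the gain error on the flank is roughly two orders of magnitude larger than at the centre, in line with the argument that the effect scales with the beam gradient.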

Mispointing is thought to be a major source of off-axis
errors in WSRT and VLA maps, and thus has been the subject of
many studies. Bhatnagar et al. (2004) propose a modification to
the selfcal algorithm called pointing selfcal, which consists of
solving for the $\delta l_p, \delta m_p$ parameters during selfcal. This is
predicated on having accurate models for both $E(l, m)$ and the off-axis
sources, and sufficient SNR to constrain the solution. Pointing
selfcal has been shown to work with simulated data, and recently
with real VLA observations (Bhatnagar, priv. comm.). Paper III
(Smirnov 2011b) will discuss a different approach to the
pointing problem.

The environmental factors responsible for mispointing can
also cause deformation of the dish surface. The resulting changes
to $E(l, m)$ are rather more difficult to predict and quantify, and
little work has been done on the subject. Harp et al. (2010) show
significant thermal-related deformations at the Allen Telescope
Array (ATA).
2.2. Ionosphere & troposphere

The ionosphere becomes a particularly troublesome DDE at low
frequencies, owing to the $\propto \nu^{-1}$ behaviour of ionospheric phase
delay, and $\propto \nu^{-2}$ behaviour of Faraday rotation. For a more
detailed look at the ionosphere and its effects on signal
propagation, see Thompson et al. (2001, Sect. 13.3) and Intema et al.
(2009). Below I will briefly summarize ionospheric effects in
terms of the RIME.

2.2.1. Ionospheric phase 

Ionospheric phase delay is caused by excess pathlength due to
refraction. In the RIME formalism, it corresponds to a scalar
Jones term: $Z_p = e^{i\xi_p}$, where $\xi_p \propto T \nu^{-1}$, and T is the Total
Electron Content along the line-of-sight. Phase delay can easily
reach $10^2$ to $10^4$ rad at lower frequencies, with variations on
relatively short timescales and small spatial scales, thus making for a
rather severe DDE. Following Lonsdale (2005), we can identify
distinct observational regimes based on the size of the array (A),
the projected size of the FoV (V), and the scale structure of the
ionosphere (S), i.e. the spatial scale on which ionospheric phase
is approximately linear. The first criterion is FoV size:

Narrow FoV: $V \ll S$, making ionospheric phase effectively
constant over the FoV ($Z_p(\mathbf{l}) = Z_p(\mathbf{0})$), and thus a DIE.
Since $Z_p$ is scalar, it can be commuted to any position in
the RIME and absorbed into another Jones term, such as the
per-antenna complex gain that is solved for during regular
selfcal.

Wide FoV: $V > S$, and therefore $Z_p$ is properly
direction-dependent.

The second criterion is array size:

Tiny array: $A \ll S$. Ionospheric phase is constant on scales of
A, thus $Z_p = Z_q$ for all p, q. This makes $Z_p Z_q^H = 1$, so the
interferometer does not "see" the ionosphere at all.

Compact array: $A \lesssim S$. Ionospheric phase is approximately
linear on scales of A. Crucially, this means that for any
direction l and baseline pq, the observed phase difference $\xi_p - \xi_q$
is proportional to the projection of the baseline u onto the
ionospheric screen, and thus:

Z p Z^ e^+^M (10 ) 

Extended array: A > S . Different stations of the array are look- 
ing through completely different parts of the ionosphere. 

The tiny array case is trivial and not considered further. 
Lonsdale regimes 1 and 2 correspond to narrow FoVs with compact
or extended arrays: these can be dealt with using regular
selfcal. In regime 3 (wide FoV, compact array), the ionosphere
manifests itself as an apparent "distortion" of the field: each
source is shifted by its own (time-variable) offset $\eta$. This can
be easily seen by inserting the Z-Jones given by Eq. (10) into
the full-sky RIME of Eq. (1), and merging it with the complex
exponent.

Finally, Lonsdale regime 4 corresponds to an extended array
and wide FoV. This is the regime in which MWA and LOFAR
are expected to operate. $Z_p Z_q^H$ then results in a baseline- and
direction-dependent phase offset, which causes each source in
the field to be "smeared" with a different PSF. Selfcal tends to
take care of the offset towards the dominant source, thus producing
an image which is adequate in the vicinity of the dominant
source, but gets increasingly distorted away from it.
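
The regime 3 behaviour (a linear phase screen acting as an apparent position shift) can be checked numerically. The sketch below is only an illustration under stated assumptions: a scalar RIME, a single point source, a small field with the w-term ignored, and a hypothetical screen whose phase gradient is expressed directly as an offset (`eta_l`, `eta_m`) in direction cosines. None of these names come from the paper.

```python
import numpy as np

def point_source_vis(u, v, l, m, flux=1.0):
    """Scalar visibility of a point source at (l, m); w-term ignored."""
    return flux * np.exp(-2j * np.pi * (u * l + v * m))

rng = np.random.default_rng(0)
u = rng.uniform(-500, 500, 50)    # baseline coordinates in wavelengths
v = rng.uniform(-500, 500, 50)

l, m = 0.01, -0.02                # true source position (direction cosines)
eta_l, eta_m = 1e-4, -2e-4        # hypothetical screen-induced offset

# Z_p Z_q^H for a linear screen: a phase proportional to the baseline vector
zz = np.exp(2j * np.pi * (eta_l * u + eta_m * v))

corrupted = zz * point_source_vis(u, v, l, m)            # source seen through the screen
shifted = point_source_vis(u, v, l - eta_l, m - eta_m)   # same source, moved on the sky
max_diff = np.max(np.abs(corrupted - shifted))
```

Under these assumptions `max_diff` is at the level of floating-point rounding: the screen is indistinguishable from a displaced source, which is why snapshot position offsets can be used to constrain such a screen.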



2.2.2. Faraday rotation 

Faraday rotation (FR) is rotation of the EM field vector that occurs
during propagation through a medium of free electrons in
the presence of a magnetic field. In RIME terms (and assuming
a linear-polarization coordinate basis), the corresponding Jones
term is a rotation matrix:

$$F = \mathrm{Rot}\,\varphi = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}, \qquad \varphi \propto \nu^{-2} \int_{\rm LoS} B_\parallel n_e\,{\rm d}l, \qquad (11)$$

where "LoS" stands for line-of-sight, $B_\parallel$ is the component
of the magnetic field parallel to the LoS, and $n_e$ is the electron
density. In a circular-polarization coordinate basis (see Paper I,
Smirnov 2011a, Sect. 6.3), $F$ becomes a differential phase delay
of the left- and right-polarized components:

$$F_\odot = HFH^{-1} = \begin{pmatrix} e^{-i\varphi} & 0 \\ 0 & e^{i\varphi} \end{pmatrix}.$$

The obvious observational effect of FR is a frequency-dependent
rotation of the angle of polarization. FR associated
with the interstellar medium can, for purposes of calibration, be
considered an intrinsic property of the sky per se. Because of
the $\nu^{-2}$ behaviour, ionospheric FR at higher frequencies is practically
negligible. For all these reasons, FR has been an obscure
effect, largely ignored outside of the field of polarimetry.

This has changed with the advent of large low-frequency arrays
such as LOFAR. In 2010, the first LOFAR long baseline
(Effelsberg-Exloo) detected a strange effect: at certain frequencies,
an unpolarized source was showing significant signal in the
XY/YX correlations, and practically none in XX/YY (Wucknitz
2010). After considerable excitement, this was linked to differential
FR (DFR). This effect is an excellent example of the explanatory
power of the RIME formalism, so it is worth considering
in some detail. At low frequencies, ionospheric FR can
be as high as several cycles (e.g. 15 cycles at 100 MHz, see
Thompson et al. 2001, Sect. 10.3), so the DFR between two stations
of a long baseline can reach significant fractions of a cycle.
Consider what happens when an unpolarized 1 Jy point source
at phase centre ($K_p \equiv 1$) is subject to an FR of $\pi/2\,[+2\pi n]$ on station
$p$, and $0\,[+2\pi n]$ on station $q$. In the absence of other effects,
the measured visibility will be:

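$$V_{pq} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad (12)$$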

or in other words, all the original $I$ flux will be detected as $V$!
This clearly shows that DFR is not only a polarimetric concern,
but is a mainstream calibration problem.
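
The DFR example is easy to reproduce numerically. The following is a minimal sketch, assuming the usual counter-clockwise rotation convention for the FR Jones matrix, a brightness matrix normalised to the identity for the unpolarized source, and a commonly used linear-to-circular conversion matrix `H`; all three are assumptions of this illustration rather than definitions taken from the text.

```python
import numpy as np

def rot(phi):
    """Rotation Jones matrix in a linear-polarization basis (sign convention assumed)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]], dtype=complex)

# Unpolarized point source at phase centre; brightness normalised so that B is
# the identity (an assumed convention), all other Jones terms set to unity.
B = np.eye(2, dtype=complex)

F_p = rot(np.pi / 2)    # Faraday rotation on station p
F_q = rot(0.0)          # no rotation on station q

V_lin = F_p @ B @ F_q.conj().T       # [[XX, XY], [YX, YY]]

# The same baseline in a circular basis, V_circ = H V_lin H^H, using an
# assumed unitary linear-to-circular conversion matrix H.
H = np.array([[1, 1j], [1, -1j]]) / np.sqrt(2)
V_circ = H @ V_lin @ H.conj().T      # [[RR, RL], [LR, LL]]

print(np.round(V_lin.real, 3))       # parallel hands vanish, cross hands do not
print(np.round(np.abs(V_circ), 3))   # amplitude stays in the parallel hands
```

In the linear basis the signal moves entirely into the cross-hand correlations, whereas in the circular basis the same rotation only changes the phases of the parallel hands, so it can be absorbed by ordinary phase calibration.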

Perhaps the most striking feature of Eq. (12) is how it describes
a complicated physical effect with very trivial mathematics.
This is a great example of the simplicity brought by the
$2 \times 2$ formalism. Interestingly, this very effect was predicted by
Hamaker et al. (1996) in the original RIME paper, but (perhaps
owing to the relative opacity of the $4 \times 4$ Mueller formalism, with
which it was described) was not immediately recalled when actual
DFR was detected.$^6$

6 According to James Anderson (priv. comm.), the VLBI community 
was aware of the implications of DFR during the 1970s, and this was a 
major reason for choosing circularly polarized receptors. Recall that in 
the circular polarization frame, DFR (or indeed any geometric rotation) 
becomes a simple phase effect, and can be subsumed into the overall 
phase calibration. I haven't been able to locate a citation for this. There 
are other compelling reasons for using circular receptors in VLBI: par- 
allactic rotation being easier to deal with is one of them. 



O.M. Smirnov: Revisiting the RIME. II. Calibration and DDEs. 






2.2.3. Refraction, curvature and absorption 

Ionospheric absorption is a relatively small amplitude effect (e.g.
0.1 dB at 100 MHz and ZA=60°, see Thompson et al. 2001), and
is mostly subsumed by the overall gain calibration. Differential
absorption makes for a non-trivial DDE, but this is tiny.

Ionospheric refraction causes an apparent shift of position of 
the source within the primary beam. This can be on the order 
of 0.05° (at 100 MHz and ZA=60°). The corresponding change 
in primary beam gain can be a significant effect, but is probably 
not in excess of that caused by uncertainties in the primary beam 
pattern itself. It can therefore be absorbed by whatever primary 
beam calibration scheme is applied to the data. 

Finally, Anderson (priv. comm.) has pointed out that refrac- 
tion through a curved ionosphere should produce a phase DDE, 
due to the fact that the apparent baseline (i.e. the baseline as seen 
by the refracted wavefront) changes. The Anderson effect should 
be detectable on LOFAR's long baselines, but it is not yet clear 
whether it can be separated from Z-Jones per se.

2.2.4. The troposphere 

The troposphere adds its own phase delay, with a roughly $\propto \nu$
behaviour. Because most of the effect actually happens very
close to the ground, tropospheric phase delay $T_p$ is essentially
a Regime 2 effect (i.e. a DIE), and can be subsumed into the
overall complex gain calibration.$^7$

Tropospheric refraction can be significant at low elevations
(Thompson et al. 2001, Sect. 10.1), so telescopes incorporate a
pointing correction to account for it. Differential tropospheric
refraction (DTR), caused by the curvature of the Earth (i.e. by 
different antennas "seeing" a source at slightly different eleva- 
tions) should cause a very small DDE. There are hints of this 
in high-dynamic-range WSRT data (de Bruyn priv. comm.), but 
more work is required to confirm detection of this. Likewise, an 
analogue of the Anderson effect should also apply to the tropo- 
sphere, but it is not clear whether this can be detected. 

2.3. Correcting for known DDEs 

Even when a (non-trivial) DDE is known (whether a priori 
or from calibration), correcting for it is a non-trivial problem. 
Several approaches to this have been proposed. 

2.3.1. Facet imaging 

If a DDE is known (and constant in time), it may be trivially corrected
for in a single direction $\boldsymbol{l}_0$ by applying the inverse of the
Jones term $E_p(\boldsymbol{l}_0)$. For example, given the observed visibilities
of the full-sky RIME, we can apply correction factors of $E_p^{-1}(\boldsymbol{l}_0)G_p^{-1}$ and
$(E_q^{-1}(\boldsymbol{l}_0)G_q^{-1})^H$. The resulting visibilities will then be given by:

$$M^{(0)}_{pq} = \mathcal{F}(\tilde{E}_p B \tilde{E}_q^H),$$

where $\tilde{E}_p(\boldsymbol{l}) = E_p^{-1}(\boldsymbol{l}_0)E_p(\boldsymbol{l})$.

We can then use standard imaging techniques (i.e. the inverse
Fourier transform) to compute $B^{(0)} = \mathcal{F}^{-1}M^{(0)}_{pq}$. Since $\tilde{E}_p \to 1$
as $\boldsymbol{l} \to \boldsymbol{l}_0$, the resulting image is equal to the "true" sky at $\boldsymbol{l}_0$
($B^{(0)}(\boldsymbol{l}_0) = B(\boldsymbol{l}_0)$), and diverges from it as we move away from $\boldsymbol{l}_0$.

7 Because of the $\propto \nu$ behaviour, this is not necessarily true at sub-mm
frequencies. The Atacama Large Millimetre Array (ALMA) will rely on
water-vapour radiometers for proper tropospheric phase calibration.



This is the essence of the facet (or polyhedron) imaging technique
pioneered by Cotton and Schwab (for an overview, see
Cornwell & Perley 1992). The direction $\boldsymbol{l}_0$ corresponds to the
centre of a facet. By imaging many small facets (each with its
own correction factor), and stitching the resulting images together,
we can approximate the "true" sky to arbitrary precision
(by making the facets suitably small). Facet imaging is available
in many 2GC packages, and is well-tested and understood. Its
major drawbacks are the high computing cost (when many facets
are involved), and the fact that time variability in $E_p$ cannot be
taken into account.
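
The way a facet-centre correction becomes exact at $\boldsymbol{l}_0$ and degrades away from it can be illustrated with a toy model. The sketch below assumes a one-dimensional direction coordinate and an entirely hypothetical E-Jones (a Gaussian beam with polarization-dependent squint plus an off-axis leakage term); the functional form and all parameter values are invented for the illustration.

```python
import numpy as np

def e_jones(l, squint=0.002, leak=0.5, width=0.02):
    """Hypothetical 2x2 E-Jones: squinted Gaussian beams plus off-axis leakage."""
    gx = np.exp(-0.5 * ((l - squint) / width) ** 2)
    gy = np.exp(-0.5 * ((l + squint) / width) ** 2)
    return np.array([[gx, leak * l], [-leak * l, gy]], dtype=complex)

def facet_corrected(l, l0):
    """Residual DDE after applying the inverse Jones term at the facet centre l0."""
    return np.linalg.inv(e_jones(l0)) @ e_jones(l)

def deviation(l, l0):
    """Frobenius-norm distance of the residual DDE from the identity."""
    return np.linalg.norm(facet_corrected(l, l0) - np.eye(2))

l0 = 0.01                                  # facet centre
dev_centre = deviation(l0, l0)             # exact at the facet centre
dev_near = deviation(l0 + 0.001, l0)       # small facet: small residual DDE
dev_far = deviation(l0 + 0.01, l0)         # large facet: large residual DDE
```

The residual grows with distance from the facet centre, which is the trade-off behind choosing the facet size: smaller facets give smaller residual DDEs at the price of more facets to image.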

2.3.2. AW-projection 

A far more promising alternative is suggested by convolutional
function approaches. The first of these was the W-projection algorithm
proposed by Cornwell et al. (2008), which corrects for
the $W_p$ term on-the-fly during imaging. This is now routinely
available in the CASA imager (and also in the lwimager tool
of the casarest package, which shares the same codebase).
Bhatnagar et al. (2008) have generalized this approach to arbitrary
DDEs. The resulting AW-projection algorithm has been
tested in an experimental version of CASA, and it is planned
to make it available in future releases (Bhatnagar priv. comm.).

The crucial insight underlying the AW-projection algorithm
is that a convolution such as Eq. (8) can be efficiently computed
both in the forward direction, during the degridding step
(when predicting visibilities from an image), and in the reverse
direction, when gridding visibilities for imaging, on the condition
that $U_p$ has limited support (i.e. is significantly non-zero
only within a limited area around the origin), which is the same
thing as $E_p$ being sufficiently smooth. If we further assume $E_p$
to be (approximately) unitary (i.e. $E_p E_p^H \approx 1$), then Eq. (8) may
even be (approximately) inverted by computing the convolution
$U_p^\dagger \circ V_{pq} \circ U_q^\dagger$. There is a fixed computational cost associated
with the extra convolution kernels, but it scales to wider fields a
lot more favourably than the facet imaging approach.

In other words, AW-projection provides an accurate method 
to apply known DDEs in the forward direction (i.e. when pre- 
dicting visibilities from a model image), and an approximate 
method to correct for them in the reverse direction (when imag- 
ing). 

While W-projection has been in use for a while and is well- 
tested, the limitations of the more general AW-projection method 
are still poorly understood. In particular, it is not clear how (or 
whether) dynamic range is limited by (a) non-unitarity of E p , 
and (b) the fact that high-order terms in U p are ignored (i.e. the 
limited support assumption). No doubt this understanding will 
improve as implementations of the algorithm become widely 
available to the community. 
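
The role of the limited-support assumption can be illustrated with a one-dimensional toy problem. This is not an implementation of AW-projection, only a sketch of the convolution theorem it relies on: a scalar, smooth image-plane DDE is applied either by direct multiplication or by convolving the DDE-free visibilities with a truncated uv-plane kernel. The grid size, source positions and Gaussian DDE are arbitrary choices made for the example.

```python
import numpy as np

n = 256
l = (np.arange(n) - n // 2) / n            # image-plane coordinate, arbitrary units
sky = np.zeros(n)
sky[[100, 140, 180]] = [1.0, 0.5, 0.2]     # three point sources
E = np.exp(-0.5 * (l / 0.1) ** 2)          # smooth toy DDE (Gaussian beam)

vis_true = np.fft.fft(E * sky)             # apply DDE in the image plane, then transform

V = np.fft.fft(sky)                        # DDE-free visibilities
U = np.fft.fft(E) / n                      # uv-plane kernel corresponding to E

def uv_convolve(U, V, offsets):
    """Circular convolution of V with kernel U, restricted to the given offsets."""
    out = np.zeros(len(V), dtype=complex)
    for j in offsets:
        out += U[j % len(U)] * np.roll(V, j)
    return out

def rel_error(approx):
    return np.max(np.abs(approx - vis_true)) / np.max(np.abs(vis_true))

err_full = rel_error(uv_convolve(U, V, range(n)))         # full kernel: exact
err_wide = rel_error(uv_convolve(U, V, range(-8, 9)))     # 17-point support
err_narrow = rel_error(uv_convolve(U, V, range(-2, 3)))   # 5-point support
```

With the full kernel the two routes agree to rounding error; truncating the kernel introduces an error that shrinks rapidly as the support is widened, because a smooth DDE has a compact uv-plane kernel. A less smooth DDE would need a wider kernel for the same accuracy.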

2.3.3. Subtraction in the uv-plane

Given a known sky model, the most straightforward way of dealing
with a known DDE is to directly evaluate Eq. (1) in the
forward direction, and subtract it from the observed visibilities.
This gives us the residuals $R_{pq} = D_{pq} - V_{pq}$, which can then be
corrected for the DIEs. Once imaged, they will still be subject to
DDEs on the same relative level. However, if a significant portion
of the flux is accounted for by the sky model, then the absolute
level of DDE-related artefacts will be much lower, perhaps
even below thermal noise (if the sky model is sufficiently deep,
and a sufficiently deep model is a requirement for calibration
anyway). The sky model itself can be added ("restored") directly
into the residual images with no error. This method was used for
the reduction of Paper III (Smirnov 2011b), and produced the
"showcase" image of Fig. 1 therein.

For a sky model composed of discrete source components,
this is also called the DFT approach, since evaluating Eq. (1)
on a per-source basis is equivalent to doing a brute-force
Discrete Fourier Transform. There has been considerable debate
in the literature and at meetings about the relative merits of the 
DFT approach vs. FFT-based methods such as AW-projection. 
DFTs have the advantage of maximum precision (at least to the 
extent that the DDE is known), but are very expensive compu- 
tationally, since they scale linearly with the number of sources 
being modelled. AW-projection is approximate (see above), but 
its computational cost scales much better, as it only depends on 
resolution. 

It should be made clear that the two approaches are comple- 
mentary rather than mutually exclusive, and can be favourably 
combined (provided compatible implementations are available, 
which is a matter of some urgency), by using DFTs for the 
brighter sources in the field, and AW-projection for the fainter 
ones. By choosing a flux threshold, one can then achieve a clear 
trade-off between accuracy (and, ultimately, dynamic range) and 
computational cost. 
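
A per-source DFT predict with DDEs, followed by subtraction, can be sketched as follows. This is a minimal illustration and not the code of any existing package: it assumes a small field (w-term ignored), DIEs of unity, and arbitrary per-source, per-antenna Jones matrices supplied as an array `E`. The function name and argument layout are hypothetical.

```python
import numpy as np

def dft_predict(uv, ant_p, ant_q, lm, B, E):
    """
    Brute-force predict of 2x2 visibilities for discrete sources with DDEs.

    uv    : (nbl, 2) baseline coordinates in wavelengths (w-term ignored)
    ant_p : (nbl,) index of the first antenna of each baseline
    ant_q : (nbl,) index of the second antenna of each baseline
    lm    : (nsrc, 2) source direction cosines
    B     : (nsrc, 2, 2) source brightness matrices
    E     : (nsrc, nant, 2, 2) per-source, per-antenna DDE Jones matrices
    """
    K = np.exp(-2j * np.pi * (uv @ lm.T))          # (nbl, nsrc) phase terms K_p K_q^H
    Ep = E[:, ant_p]                                # (nsrc, nbl, 2, 2)
    EqH = E[:, ant_q].conj().swapaxes(-1, -2)       # Hermitian transpose of E_q
    coh = Ep @ B[:, None] @ EqH                     # E_p B E_q^H per source and baseline
    return np.einsum("bs,sbij->bij", K, coh)        # sum over sources

rng = np.random.default_rng(1)
nant, nsrc = 3, 2
ant_p, ant_q = np.array([0, 0, 1]), np.array([1, 2, 2])
uv = rng.uniform(-300, 300, (3, 2))
lm = np.array([[0.0, 0.0], [0.01, -0.005]])
B = np.array([np.eye(2), 0.1 * np.eye(2)], dtype=complex)     # bright and faint source
E = np.eye(2) + 0.1 * (rng.normal(size=(nsrc, nant, 2, 2))
                       + 1j * rng.normal(size=(nsrc, nant, 2, 2)))

data = dft_predict(uv, ant_p, ant_q, lm, B, E)                 # noise-free "observed" data
bright = dft_predict(uv, ant_p, ant_q, lm[:1], B[:1], E[:1])   # model of the bright source
residual = data - bright                                       # R_pq = D_pq - V_pq
```

The cost of `dft_predict` grows linearly with the number of sources, which is the scaling argument made above. In a hybrid scheme only the sources above a chosen flux threshold would be passed through such a routine, with the remainder predicted on a grid.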

2.4. Calibrating the unknown DDEs 
2.4.1. Selfcal contamination 

None of the 2GC packages provide any explicit capabilities
for calibration of the unknown DDEs, since they all assume
an implicit RIME similar to Eq. (3), with a single direction-independent
gain term. Consider a very simplified picture, with
a field consisting of only two discrete point sources with brightnesses
of $B_0$ and $B_1$, and assume DIEs of unity. The observed
visibilities $D_{pq}$ are then given by Eq. (15) of Paper I (Smirnov
2011a), with $G_p = 1$:

$$D_{pq} = E_{0p}X_{0pq}E_{0q}^H + E_{1p}X_{1pq}E_{1q}^H + N, \qquad (13)$$

where $X_{spq} = K_{sp}B_sK_{sq}^H$,

and $N$ is a $2 \times 2$ matrix of Gaussian noise. Traditional selfcal
(assuming a perfectly known sky model) then attempts to fit $D_{pq}$
with the following RIME:

$$V_{pq} = G_p(X_{0pq} + X_{1pq})G_q^H \qquad (14)$$



in a least-squares sense, over all baselines $pq$. Obviously, the
best-fitting $G_p \to E_{0p}$ as $B_1 \to 0$. On the other hand, if $B_1 \sim B_0$,
$G_p$ will be some kind of average between $E_{0p}$ and $E_{1p}$. Because
of the complex phase behaviour in the $K$ terms, this is difficult to
analyse in detail. To get a qualitative picture, let us consider the
scalar case. Assume that $E_p$ is scalar and purely real, and that the
sources are unpolarized, so $B_s$ is scalar as well: $B_s = I_s$. We can
see that the biggest discrepancies (in amplitude) occur when the
phases of the additive terms in Eq. (13) add either constructively
or destructively. In these two cases, we get:



$$|D_{pq}| = E_{0p}E_{0q}\left(I_0 \pm \frac{E_{1p}E_{1q}}{E_{0p}E_{0q}}\,I_1\right)$$

$$|V_{pq}| = |G_p||G_q|\,(I_0 \pm I_1)$$



For any non-trivial array configuration, each baseline has a
different fringe rate, so at any point in time some baselines will
be closer to constructive addition, and others will be closer to
destructive addition. Therefore, no set of $G_p$ can achieve a perfect
fit of $D_{pq}$ to $V_{pq}$. However, from the above we can infer an upper
bound on the relative error of the fit:

$$1 - \Xi_{0,1} < \frac{|D_{pq}|}{|V_{pq}|} < 1 + \Xi_{0,1}, \qquad \Xi_{0,1} = \max_p \left|\frac{E_{1p}}{E_{0p}}\right| \frac{I_1}{I_0}. \qquad (15)$$

I shall call $\Xi_{0,1}$ the selfcal contamination factor [of source 1
into source 0]. I do not have a formal proof for a lower bound
on the error terms in Eq. (15), but extensive simulations with
MeqTrees suggest that it is also proportional to $\Xi_{0,1}$. We can
therefore summarize these considerations as follows: in the presence
of DDEs, traditional selfcal will tend to subsume the DDEs
in the direction of the dominant source into its selfcal gain solutions;
the fitted visibilities will be subject to contamination from
the unmodelled DDEs towards the next-brightest source, with a
relative error proportional to $\Xi_{0,1}$.

Similar considerations apply to any discrepancies (i.e. missing
sources, etc.) in the sky model. Ultimately, selfcal contamination
makes itself felt via artefacts in the resulting images,
which can be extraordinarily complicated and counter-intuitive
(for an example, see Fig. 17 of Paper III, Smirnov 2011b).
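
The qualitative behaviour described above (gain solutions that follow the DDE towards the dominant source, with a bias that grows with the flux of the second source) can be reproduced with a toy scalar simulation. The sketch below is not the solver of any particular package: it assumes scalar, real DDEs, a one-dimensional array, noise-free data, and a simple damped alternating least-squares update. The antenna positions, DDE values and source offset are arbitrary.

```python
import numpy as np

def solve_gains(D, M, n_iter=200):
    """Least-squares fit of D[p,q,t] ~ g[p] M[p,q,t] conj(g[q]) for scalar gains g."""
    g = np.ones(D.shape[0], dtype=complex)
    for _ in range(n_iter):
        z = M * g.conj()[None, :, None]                      # M_pq conj(g_q)
        g_new = (D * z.conj()).sum(axis=(1, 2)) / (np.abs(z) ** 2).sum(axis=(1, 2))
        g = 0.5 * (g + g_new)                                # damped update
    return g

def gain_bias(I1, I0=1.0):
    """Largest relative deviation of |G_p| from E_0p for a two-source field."""
    x = np.array([0.0, 100.0, 250.0, 420.0, 700.0])         # antenna positions, wavelengths
    h = np.linspace(-1.0, 1.0, 60)                           # hour angle, radians
    u = x[:, None] * np.cos(h)[None, :]                      # (nant, ntime)
    E0 = np.array([1.0, 0.9, 1.1, 1.0, 0.95])                # DDE towards source 0
    E1 = E0 * np.array([0.7, 0.8, 0.9, 1.1, 1.3])            # DDE towards source 1
    off = ~np.eye(len(x), dtype=bool)[:, :, None]            # mask out autocorrelations

    def coherency(I, l):
        k = np.exp(-2j * np.pi * u * l)                      # scalar K term per antenna
        return I * k[:, None, :] * k.conj()[None, :, :] * off

    X0, X1 = coherency(I0, 0.0), coherency(I1, 0.01)
    D = (E0[:, None, None] * E0[None, :, None] * X0
         + E1[:, None, None] * E1[None, :, None] * X1)       # scalar form of Eq. (13)
    g = solve_gains(D, X0 + X1)                              # fit with the model of Eq. (14)
    return np.max(np.abs(np.abs(g) - E0) / E0)

bias_faint = gain_bias(I1=1e-4)     # second source negligible
bias_bright = gain_bias(I1=0.5)     # second source comparable to the first
```

With a negligible second source the solutions reproduce the DDE towards source 0 almost exactly, while a comparably bright second source pulls them away by an amount that depends on how different the two DDEs are.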

2.4.2. Peeling 

The peeling algorithm was originally proposed by Noordam
(2004) as a way of calibrating and removing DDEs from bright
sources one by one, in order of decreasing brightness. Since its
introduction, the term "peeling" has been misunderstood and di- 
luted to the point where it is occasionally used to describe any 
technique incorporating direction-dependent solutions, but this 
is incorrect. In its original formulation, peeling refers to a very 
specific calibration algorithm: 

1. A normal selfcal solution is performed, using an equation
such as (3). The resulting $G_p$ solutions will tend to incorporate
DDEs in the direction of the brightest source $s_0$.

2. The prediction for $s_0$ is subtracted from the data. This is the
"peeling" step per se: our best estimate for the visibility contribution
of $s_0$ is, in a sense, peeled away:

$$D^{(1)}_{pq} = D_{pq} - G_p X_{s_0 pq} G_q^H.$$

3. Optionally, the $D^{(1)}_{pq}$ visibilities are corrected by applying
$G_p^{-1}$.

4. Optionally, the $D^{(1)}_{pq}$ visibilities are phase-shifted to the position
of the next-brightest source $s_1$ and averaged down in
time and frequency (to smear out the contribution of other
sources).

5. The visibilities are presumably dominated by source $s_1$.
We now go back and repeat the procedure for $s_1$.

Peeling has the considerable advantage that all existing 2GC 
calibration packages provide sufficient functionality to imple- 
ment its steps, so it has been widely tested and accepted in the 
community. 
The major drawback of peeling is that it can be very expensive
computationally. Note that the solutions at step 1 are subject
to selfcal contamination $\Xi_{s_0,s_1}$. This error is "frozen in" at step
2, when the fitted visibilities (for source $s_0$) are subtracted from
the data. It can then further contaminate the solutions for $s_1$ (in
addition to the contamination $\Xi_{s_1,s_2}$). If the source being peeled is
truly dominant, then this contamination can be negligible, but if
the brightness of $s_0$ and $s_1$ is comparable, it can become quite
severe. These errors can be driven down by repeated iterations
through the peeling cycle (with clever subtraction of sources), at
the cost of significant CPU and I/O overhead. This makes peeling
impractical when dealing with more than just a few sources.



2.4.3. Differential gains 

The differential gains approach is closely related to peeling. It
may be thought of as a generalized, simultaneous form of peeling.
A detailed practical example will be discussed in Paper III,
but the essence is to use a RIME of the form:

$$V_{pq} = G_p\left(\sum_s \Delta E_{sp}\,X_{spq}\,\Delta E_{sq}^H\right)G_q^H \qquad (16)$$

and solve for $G_p$ on small time/frequency scales (as per
normal selfcal), then simultaneously solve for $\Delta E_{sp}$ on larger
time/frequency scales, for a subset of fainter sources. The $G_p$ solutions
then subsume all DDEs in the direction of the dominant
source, while the $\Delta E_{sp}$ terms account for the difference towards
the fainter sources. If some of the DDEs are known a priori,
suitable terms for them can be inserted into the equation above
in addition to $\Delta E_{sp}$. The differential gain solution will then account
only for the remaining unknown DDEs.

Note that solving for $\Delta E$ on a single off-axis source is equivalent
to peeling the dominant source and solving for the off-axis
source (with suitable solution intervals chosen for each selfcal
step). The $\Delta E$ approach overcomes a lot of the drawbacks
of peeling (contamination of solutions and frozen-in errors, the
need for repeated selfcal cycles) by doing a single simultaneous
solution in one step.

Differential gains share a common weakness with peeling:
that of proliferation of degrees of freedom (DoF's). This is partially
mitigated by using larger solution intervals, but it is obvious
that we cannot simultaneously solve for $\Delta E_{sp}$ towards all
sources in a typical field, since that would be gross over-fitting.
(Not to mention the CPU cost of solving for that many param- 
eters simultaneously, which would probably become prohibitive 
first.) 
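
The degrees-of-freedom argument can be made concrete with simple bookkeeping. The helper below is a hypothetical counting sketch: it assumes full 2x2 complex Jones matrices for both the gains and the differential gains, counts one independent solution per solution interval, and ignores noise, flagging and gain ambiguities. The array size and interval lengths in the example are invented.

```python
def dof_budget(n_ant, n_time, n_freq, g_cell, de_cell, n_de_sources):
    """
    Count complex unknowns and complex measurements for a differential-gain solve.

    g_cell, de_cell : (time, freq) solution-interval sizes, in samples,
                      for the G and differential-gain terms respectively
    """
    def n_cells(cell):
        nt = -(-n_time // cell[0])       # ceiling division
        nf = -(-n_freq // cell[1])
        return nt * nf

    per_jones = 4                                         # elements of a 2x2 matrix
    n_g = n_ant * per_jones * n_cells(g_cell)
    n_de = n_de_sources * n_ant * per_jones * n_cells(de_cell)
    n_bl = n_ant * (n_ant - 1) // 2
    n_data = n_bl * per_jones * n_time * n_freq
    return {"unknowns": n_g + n_de, "measurements": n_data,
            "ratio": (n_g + n_de) / n_data}

# Hypothetical 14-antenna array, 720 time slots, 64 channels.
few = dof_budget(14, 720, 64, g_cell=(1, 1), de_cell=(30, 16), n_de_sources=5)
many = dof_budget(14, 720, 64, g_cell=(1, 1), de_cell=(30, 16), n_de_sources=500)
short = dof_budget(14, 720, 64, g_cell=(1, 1), de_cell=(1, 1), n_de_sources=5)
```

In this toy budget a handful of differential-gain sources on long solution intervals adds little to the unknowns already spent on $G_p$, whereas either many sources or short intervals push the ratio of unknowns to measurements towards unity, which is the over-fitting regime referred to above.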

2.4.4. Parametrized models and beacon sources 

The DoF issue can be addressed if the DDE in question can be
represented by a parametrized model for $E_p$. We can then solve
for the parameters of that model (presumably, few in number),
and then correct for the resulting $E_p$ estimate using one of the
methods of Sect. 2.3.

A number of approaches have shown that this is feasible.
For the ionosphere, the field-based calibration (FBC) method of
Cotton et al. (2004) uses the position offsets of sources (in individual
snapshot images) to fit a global phase screen over the
array. The source peeling and atmospheric modelling (SPAM)
algorithm of Intema et al. (2009) does a similar fit to phase solutions
obtained via peeling (in AIPS). Both methods show how
to work around the limitations of 2GC packages: since direct
fits to visibilities are impossible in the framework of the latter,
especially without a fully-fledged RIME, they rely on standard
calibration methods (including peeling), and fit a model to the results
of calibration. Hull et al. (2010) have demonstrated a similar
approach for E-Jones, using source fluxes to fit the FWHM
parameter of the ATA beam.

Given an explicit RIME, it should be possible to fit 
parametrized models directly to the observed visibilities. The 
minimum ionospheric model (MIM) approach proposed by 
Noordam is similar to FBC and SPAM, in that it purports to 
fit a smooth model for ionospheric phase, but is different in 
that it uses visibilities (but also other sources of data, such as 
GPS measurements). This requires a software system where ex- 
plicit RIMEs may be implemented, and so cannot be adapted to 
2GC packages, but it has been demonstrated in the LOFAR BBS 
system, using a simple linear-slope MIM. The pointing selfcal
method (Bhatnagar et al. 2004) already mentioned above is an 
application of the same approach to pointing errors. 

All these methods have the common feature of relying on 
beacon sources, that is, having enough sources in the field to 
constrain the solutions. The availability of a sufficient number 
of beacons is a crucial question for the calibratability of future 
instruments. I will return to this in the conclusion to Paper III
(Smirnov 2011b), after the results presented therein have been
considered.

Note that, just as in the DFT-vs.-FFT debate discussed
in Sect. 2.3.3, there is a related dichotomy between the
parametrized model approach, and methods based on direction-dependent
solutions (peeling, differential gains). The latter
methods require the use of DFTs at the predict stage, since
the FFT approach (AW-projection) cannot be applied without a
model of $E_p(\boldsymbol{l})$ for the entire field. Parametrized models, on the
other hand, may be applied both via DFT and FFT.

Once again, I suggest that the two approaches should be
treated as complementary. Looking ahead, the results of Paper III
(Smirnov 2011b) will show that brighter off-axis sources exhibit
all sorts of complicated structure in their $\Delta E_p$ solutions, even
in the relatively uncomplicated (i.e. low-DDE) case of WSRT
21 cm observations. It is hard to see how this can be captured
by a parametrized DDE model to a precision sufficient for error-free
subtraction of such sources. This suggests a similar trade-off
in accuracy vs. computing cost as that described in Sect. 2.3.3,
leading to the following hybrid approach for dealing with DDEs:

1. The unknown DDEs are calibrated for via parametrized 
model(s), which [hopefully] accounts for the bulk of the ef- 
fect. 

2. In addition, $\Delta E_p$ solutions are obtained for the brighter off-axis
sources, to account for any deviations from the sky or
DDE models towards those sources.

3. The brightest sources are predicted and subtracted via DFT. 

4. Fainter sources are predicted and subtracted via FFT. 

5. The residuals are corrected for during imaging using AW- 
projection. 

Note that the sets of sources involved at steps 2, 3 and 4 are
conceptually similar to "Cat I" and "Cat II" sources proposed for
LOFAR calibration (Nijboer & Noordam 2007), but here I suggest
three sets rather than two. The exact partitioning of sources
into sets determines the accuracy vs. computing cost trade-off.



2.4.5. Comparative summary of approaches 

It may be interesting to compare the different approaches to a 
particular class of DDE, for instance pointing error. Pointing 
errors introduce an E-Jones as given by Eq. (9). To date,
three relevant approaches have been proposed: pointing selfcal
(Bhatnagar et al. 2004), peeling (Sect. 2.4.2) and differential
gains (Sect. 2.4.3). Of these, peeling is by far the best tested,
since it is available with all 2GC software packages. Differential
gains are available in MeqTrees; pointing selfcal is implemented
in an experimental version of CASA (Bhatnagar priv. comm.),
but is not publicly available at time of writing. This makes a
quantitative comparison impossible, but the algorithms may be
compared in principle.

The peeling approach and differential gains are very similar
in that they attempt to solve for the same phenomenological effect:
a direction-dependent complex gain term. In essence, peeling
approximates a full-sky RIME as:

$$V_{pq} = G_{1p}\left(X_{1pq} + G_{2p}\left(X_{2pq} + G_{3p}(\cdots)G_{3q}^H\right)G_{2q}^H\right)G_{1q}^H,$$

where $X_{spq}$ is the model coherency of source $s$ (typically
a phase-shifted delta function, for a point source model, but
Gaussian sources are also possible in e.g. NEWSTAR). Peeling
consists of a least-squares solution for one set of gains at a
time (as in regular selfcal), followed by "temporary" subtraction
of sources for which a solution has been obtained. Differential
gains uses an equation like (16). First, a regular selfcal step is
done to obtain $G_p$ solutions on short time/frequency scales. This
is followed by a simultaneous least-squares solution for all the
$\Delta E_{sp}$ terms, on longer time/frequency scales.

Peeling is subject to selfcal contamination at each stage 
of the process, due to the as-yet-unsolved-for contributions of 
fainter sources. This is especially severe when sources have 
comparable flux. Differential gains overcomes this by solving for 
all sources simultaneously. In principle, it should be possible to 
drive contamination arbitrarily low (and thus achieve the same 
result as differential gains) via several iterations of the peeling 
cycle, but this is both labour-intensive, and requires many passes 
through the data. 

Both approaches solve for per-antenna, per-direction gains,
while overlooking the fact that physically, these are due to a
single per-antenna pointing offset (and thus ignoring Eq. (9)).
Pointing selfcal tries to solve for the true offset itself. In effect,
it uses a RIME of the form of Eq. (7), where the convolutional
terms $U_p = \mathcal{F}E_p$ are the aperture illumination patterns,
i.e. the Fourier transforms of the primary beams $E_p$, and $X$ are
the full-sky model coherencies. At the heart of the algorithm is a
clever minimization scheme, which essentially decomposes $U_p$
into first- and second-order terms of the pointing offsets $\delta l_p$, $\delta m_p$.
This assumes that the primary beam has a functional form, and
that it is (at least to zero-th order) Gaussian.

The advantage of pointing selfcal is that a single per-antenna 
pointing offset is obtained, and that the entire model sky (in- 
cluding extended emission!), rather than discrete components, 
is used to constrain the solution. Peeling and differential gains 
solve for the total effective gain in each direction, and are less 
well-constrained by definition. On the other hand, the latter 
two approaches will happily absorb all unknown DDEs into the 
direction-dependent solution, while it is yet unclear to what ex- 
tent pointing selfcal is robust in the presence of other DDEs. 
The fact that the entire sky is used to constrain the solution also 
seems to be a double-edged sword. In particular, it is not clear 
how pointing selfcal is affected by having a bright source near a 
null or a sidelobe, where the primary beam is particularly poorly 
approximated by the functional form. 

In terms of performance, pointing selfcal should in principle
be the fastest method of the three, since it solves for the smallest
number of unknowns, and also allows for the entire sky to be predicted
via an FFT. Differential gains are slower, which is partly
due to the use of DFTs for source prediction, although the true
bottleneck is the far larger number of unknowns. Peeling, on the 
other hand, is I/O-bound due to the large number of data passes, 
which will usually make it the slowest of the lot. 

3. Conclusions 

Several authors have developed approaches to the DDE problem
based on the RIME, using different (but mathematically equivalent)
versions of the formalism. This paper has attempted to
reformulate these using one consistent $2 \times 2$ formalism, and consider
how these methods may be combined.

A look at such DDEs as instrumental polarization (Sect. 2.1)
and differential Faraday rotation (Sect. 2.2.2) suggests that the
study of polarized signals is no longer a side issue of interest 
only to polarimetry per se. Proper calibration of the new crop of 
instruments requires that a full-polarization picture be consid- 
ered from the beginning. Fortunately, the RIME provides just 
such a picture, by recasting the signal in terms of 2 x 2 co- 
herency matrices rather than IQUV vectors. This allows com- 
plicated propagation effects to be described in terms of rigorous 
and straightforward matrix algebra, and builds valuable links be- 
tween one's physical and mathematical intuition. 